Text binarization in color documents
نویسندگان
چکیده
This article presents a new method for the binarization of color document images. Initially, the colors of the document image are reduced to a small number using a new color reduction technique. Specifically, this technique estimates the dominant colors and then assigns the original image colors to them in order that the background and text components to become uniform. Each dominant color defines a color plane in which the connected components (CCs) are extracted. Next, in each color plane a CC filtering procedure is applied which is followed by a grouping procedure. At the end of this stage, blocks of CCs are constructed which are next redefined by obtaining the direction of connection (DOC) property for each CC. Using the DOC property, the blocks of CCs are classified as text or nontext. The identified text blocks are binarized properly using suitable binarization techniques, considering the rest of the pixels as background. The final result is a binary image which contains always black characters in white background independently of the original colors of each text block. The proposed document binarization approach can also be used for binarization of noisy color (or grayscale) document images. Several experiments that confirm the effectiveness of the proposed technique are presented. VC 2007 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 16, 262–274, 2006; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/
منابع مشابه
Font and Background Color Independent Text Binarization
We propose a novel method for binarization of color documents whereby the foreground text is output as black and the background as white regardless of the polarity of foreground-background shades. The method employs an edge-based connected component approach and automatically determines a threshold for each component. It has several advantages over existing binarization methods. Firstly, it can...
متن کاملAn Analysis of Image Binarization Techniques for Natural Scene Images
Text extraction from natural scene images is an emerging field in computer graphics. Extracted text contains important information that can be used for various purpose like vehicle number plate detection to identify the vehicle, to provide information of surrounding to visually impaired persons, preservation of information of historical documents etc. Binarization is a key process in text extra...
متن کاملDetecting Text in Natural Scenes Based on a Reduction of Photometric Effects: Problem of Color Invariance
In this paper, we propose a novel method for detecting and segmenting text layers in complex images. This method is robust against degradations such as shadows, non-uniform illumination, low-contrast, large signaldependent noise, smear and strain. The proposed method first uses a geodesic transform based on a morphological reconstruction technique to remove dark/light structures connected to th...
متن کاملAColDPS - Robust and Unsupervised Automatic Color Document Processing System
This paper presents the first fully automatic color analysis system suited for business documents. Our pixelbased approach uses mainly color morphology and does not require any training, manual assistance, prior knowledge or model. We developed a robust color segmentation system adapted for invoices and forms with significant color complexity and dithered background. The system achieves several...
متن کاملAn Enhancement of Images Using Recursive Adaptive Gamma Correction
The “Adaptive Approach for Historical or Degraded Document Binarization” is that in which Libraries and Museums obtain in large gathering of ancient historical documents printed or handwritten in native languages. Typically, only a small group of people are allowed access to such collection, as the preservation of the material is of great concern. In recent years, libraries have begun to digiti...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Int. J. Imaging Systems and Technology
دوره 16 شماره
صفحات -
تاریخ انتشار 2006